Word Stemming for Arabic Information Retrieval: The Case for Simple Light Stemming
Abstract
Although a number of attempts have been made to develop a stemming formalism for the Arabic language, most of them have focused only on the lexical structure of words as modeled by Arabic grammatical and morphological rules. This paper discusses the merits of light stemming for Arabic data and presents a simple light stemming strategy developed from an analysis of how suffixes and prefixes actually occur in real texts. The performance of this strategy is compared with that of a heavier stemming strategy that takes most grammatical prefixes and suffixes into consideration. The results indicate that only a few of the prefixes and suffixes affect the correctness of the generated stems. Light stemming performed better than heavy stemming on over-stemming and under-stemming measures. The two stemming strategies also differ significantly in retrieval performance.
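To make the idea of light stemming concrete, the following is a minimal Python sketch of an affix-stripping stemmer. It is not the strategy evaluated in the paper: the abstract does not list the affixes it retains, so the `PREFIXES` and `SUFFIXES` tuples, the `MIN_STEM_LEN` guard, and the function name `light_stem` are all illustrative assumptions.

```python
# Illustrative affix lists (assumed for this sketch, not taken from the paper).
PREFIXES = ("وال", "بال", "كال", "فال", "ال", "لل", "و")
SUFFIXES = ("ها", "ان", "ات", "ون", "ين", "ية", "ه", "ة", "ي")

# Assumed guard: never strip an affix if fewer than 3 letters would remain,
# which limits over-stemming of short words.
MIN_STEM_LEN = 3


def light_stem(word: str) -> str:
    """Strip at most one prefix and one suffix, trying longer affixes first."""
    stem = word

    # Remove the longest matching prefix that leaves a long enough stem.
    for prefix in sorted(PREFIXES, key=len, reverse=True):
        if stem.startswith(prefix) and len(stem) - len(prefix) >= MIN_STEM_LEN:
            stem = stem[len(prefix):]
            break

    # Remove the longest matching suffix that leaves a long enough stem.
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if stem.endswith(suffix) and len(stem) - len(suffix) >= MIN_STEM_LEN:
            stem = stem[:-len(suffix)]
            break

    return stem


if __name__ == "__main__":
    for token in ("الكتاب", "والمعلمون", "كتاب"):
        print(token, "->", light_stem(token))
```

Under these assumed lists, a definite noun and its bare form conflate to the same stem, which is the behavior an information retrieval index relies on. Shrinking the tuples to only the affixes that matter in real text would correspond to the lighter strategy the abstract argues for, while extending them toward the full grammatical inventory would approximate the heavier one.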
Similar Resources
Effective Stemming for Arabic Information Retrieval
Arabic has a very rich and complex morphology. Its appropriate morphological processing is very important for Information Retrieval (IR). In this paper, we propose a new stemming technique that tries to determine the stem of a word representing the semantic core of this word according to Arabic morphology. This method is compared to a commonly used light stemming technique which truncates a wor...
Investigating the Effects of Stemming on Information Retrieval in Persian
Using language-specific behavior in information retrieval systems can significantly improve the quality of the retrieved results. The part of a word that remains after its affixes are removed is called the stem. Stemming can be used to improve the relevance of results in an information retrieval system. Different morphological variants of words (plural, past tense…) will be mapped into t...
The Enhancement of Arabic Stemming by Using Light Stemming and Dictionary-Based Stemming
Word stemming is one of the most important factors affecting the performance of many natural language processing applications, such as part-of-speech tagging, syntactic parsing, machine translation, and information retrieval. Computational stemming is an urgent problem for Arabic Natural Language Processing, because Arabic is a highly inflected language. The existing stemmers hav...
Arabic Light Stemmer: A New Enhanced Approach
In general, word stemming is one of the most important factors affecting the performance of information retrieval systems. The optimization issues of the Arabic light stemming algorithm, a main component of natural language processing and information retrieval for Arabic, are based on root-pattern schemes. Since Arabic is a highly inflected language and has a complex morphologi...
Light Stemming for Arabic Information Retrieval
Computational Morphology is an urgent problem for Arabic Natural Language Processing, because Arabic is a highly inflected language. We have found, however, that a full solution to this problem is not required for effective information retrieval. Light stemming allows remarkably good information retrieval without providing correct morphological analyses. We developed several light stemmers for ...
Publication date: 2013